Modelling the Retrieval of Structured Documents Containing Texts and Images

Authors

  • Carlo Meghini
  • Fabrizio Sebastiani
  • Umberto Straccia
Abstract

We present a model for complex documents that may consist of a hierarchically structured set of images or texts. Documents are represented at three levels: the form level (as sets of physical features of the representing objects), the content level (as sets of properties of the represented entities), and the structure level. A uniform and powerful query language allows queries to be issued that transparently combine features pertaining to form, content and structure alike. Queries are expressions of a (fuzzy) logical language. While the part of the query that pertains to (medium-independent) content is “directly” processed by an inferential engine, the part that pertains to (medium-dependent) form is entrusted to specialised document processing procedures linked to the logical language by a procedural attachment mechanism. The model thus combines the power of state-of-the-art document processing techniques with the advantages of a clean, logically defined framework for understanding multimedia document retrieval.
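
The split between inferred content predicates and procedurally attached form predicates can be illustrated with a small sketch. It is not the authors' formalism: the names `Document`, `ATTACHED`, `redness`, `degree`, and `fuzzy_and` are hypothetical, the colour feature is invented for illustration, and the use of `min` for fuzzy conjunction is an assumption (a common choice in fuzzy logic), since the abstract does not specify the semantics.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class Document:
    doc_id: str
    # Content level: hypothetical assertions with degrees of truth in [0, 1].
    content: Dict[str, float] = field(default_factory=dict)
    # Form level: a hypothetical physical feature (mean RGB colour of an image).
    mean_rgb: Tuple[float, float, float] = (0.0, 0.0, 0.0)


def redness(doc: Document) -> float:
    """Form-level procedure: share of red in the mean colour, in [0, 1]."""
    r, g, b = doc.mean_rgb
    total = r + g + b
    return r / total if total else 0.0


# Procedural attachment (assumed shape): predicate names mapped to procedures.
ATTACHED: Dict[str, Callable[[Document], float]] = {"IsReddish": redness}


def degree(doc: Document, predicate: str) -> float:
    """Degree to which `predicate` holds for `doc`."""
    if predicate in ATTACHED:
        # Medium-dependent form: delegate to the attached procedure.
        return ATTACHED[predicate](doc)
    # Medium-independent content: look up the asserted degree (0.0 if absent).
    return doc.content.get(predicate, 0.0)


def fuzzy_and(doc: Document, *predicates: str) -> float:
    """Fuzzy conjunction, assumed here to be the minimum of the degrees."""
    return min(degree(doc, p) for p in predicates)


if __name__ == "__main__":
    d = Document("d1", {"AboutSunset": 0.9}, (200.0, 100.0, 100.0))
    print(fuzzy_and(d, "AboutSunset", "IsReddish"))
```

In this sketch a single query mixes both kinds of predicate transparently: the caller of `fuzzy_and` does not need to know which predicates are looked up and which are computed from the image.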

Similar resources

Investigating the Role of Types of Homograph Contexts in Determining Similarity between Documents

Aim: Automatic information retrieval is based on the assumption that texts contain content or structural elements that can be used in word sense disambiguation and thereby improve the effectiveness of the retrieved results. Homographs are among the words requiring sense disambiguation. Depending on their roles and positions in texts, homograph contexts could be divided into different types, wit...

Modelling Multimedia Structured Documents: A Retrieval Oriented Approach

We describe in this paper the modelling of multimedia structured documents according to the potential ways to retrieve them. We consider that the works already done on such types of documents do not focus on this point enough. So, our model is based on views that reflect the potential ways of “seeing” multimedia documents. The semantic content of a document is one way of seeing documents. So, an...

Using Text Surrounding Method to Enhance Retrieval of Online Images by Google Search Engine

Purpose: The current research aimed to compare the effectiveness of various tags and codes for retrieving images from Google. Design/methodology: Selected images with different characteristics in a registered domain were carefully studied. The exception was that special conceptual features have been apportioned for each group of images separately. In this regard, each group image surr...

A Method for Keyword Extraction and Word Weighting to Improve the Classification of Persian Texts

Due to the ever-increasing expansion of information and the huge amount of existing unstructured documents, keywords play a very important role in information retrieval. Because manual extraction of keywords faces various challenges, their automated extraction seems inevitable. In this research, we attempt to use a thesaurus (a structured word-net) to extract them automatically. A...

How Are Searching and Reading Intertwined during Retrieval from Hierarchically Structured Documents?

Abstract: Effective use of information retrieval systems requires that users know when to – temporarily – cease searching to do some reading and where to start reading. In hierarchically structured documents, users can to some extent interchange searching and reading by entering the text at different levels in the structure. Based on an experiment where 83 subjects solved 20 tasks each, we find...

Publication date: 1997